
    Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition

    Automated Human Activity Recognition has long been a problem of great interest in human-centered and ubiquitous computing. In recent years, a plethora of supervised learning algorithms based on deep neural networks have been suggested to address this problem using various modalities. While every modality has its own limitations, there is one common challenge: supervised learning requires vast amounts of annotated data, which is practically hard to collect. In this paper, we benefit from the self-supervised learning (SSL) paradigm, which is typically used to learn deep feature representations from unlabeled data. Moreover, we upgrade SimCLR, a contrastive SSL framework widely used in various applications, by introducing a temporal feature alignment procedure for Human Activity Recognition. Specifically, we propose integrating a dynamic time warping (DTW) algorithm in the latent space to force features to be aligned along the temporal dimension. Extensive experiments have been conducted for the unimodal scenario with the inertial modality as well as in multimodal settings using inertial and skeleton data. According to the obtained results, the proposed approach has great potential for learning robust feature representations compared to recent SSL baselines, and clearly outperforms supervised models in semi-supervised learning. The code for the unimodal case is available at https://github.com/bulatkh/csshar_tfa.
    Comment: Accepted to IJCB 202
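
    As a rough illustration of the core idea (the official unimodal code is in the repository above), the following is a minimal sketch of a DTW alignment cost between two latent sequences. All names, shapes, and the normalization are invented for the example, not taken from the authors' implementation.

        # Minimal sketch: DTW-based temporal alignment cost in a latent space.
        # Illustrative only; see https://github.com/bulatkh/csshar_tfa for the
        # official implementation.
        import numpy as np

        def dtw_alignment_cost(z_a: np.ndarray, z_b: np.ndarray) -> float:
            """Align two latent sequences (T_a, D) and (T_b, D) with classic DTW
            and return the normalized cost along the optimal warping path."""
            # Pairwise Euclidean distances between all pairs of time steps.
            dist = np.linalg.norm(z_a[:, None, :] - z_b[None, :, :], axis=-1)
            T_a, T_b = dist.shape
            # Dynamic-programming table of cumulative alignment costs.
            acc = np.full((T_a + 1, T_b + 1), np.inf)
            acc[0, 0] = 0.0
            for i in range(1, T_a + 1):
                for j in range(1, T_b + 1):
                    acc[i, j] = dist[i - 1, j - 1] + min(
                        acc[i - 1, j],      # insertion
                        acc[i, j - 1],      # deletion
                        acc[i - 1, j - 1],  # match
                    )
            # Normalize by an upper bound on the warping-path length.
            return float(acc[T_a, T_b] / (T_a + T_b))

        # Usage: penalize temporal misalignment between two views of a sample.
        rng = np.random.default_rng(0)
        view_1 = rng.normal(size=(50, 128))   # (time steps, latent dim)
        view_2 = rng.normal(size=(60, 128))
        alignment_loss = dtw_alignment_cost(view_1, view_2)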

    Skeleton-based Human Action Recognition using Basis Vectors

    Automatic human action recognition is a research topic that has attracted significant attention lately, mainly due to advancements in sensing technologies and improvements in computational systems' power. However, the complexity of human movements, input device noise, and person-specific pattern variability impose a series of challenges that remain to be overcome. In this work, a novel human action recognition method using Microsoft Kinect depth sensing technology is presented to handle the above-mentioned issues. Each action is represented as a basis vector, and spectral analysis is performed on an affinity matrix of new action feature vectors. By using simple kernel regressors to compute the affinity matrix, complexity is reduced and robust low-dimensional representations are achieved. The proposed scheme relaxes action detection accuracy demands and can be extended to accommodate multiple modalities in a dynamic fashion.
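
    As a hedged illustration of the general recipe the abstract describes (kernel-based affinities over action feature vectors followed by spectral analysis for low-dimensional representations), the sketch below uses an RBF kernel and an eigendecomposition of the normalized affinity. The kernel choice, bandwidth, and dimensions are assumptions, not the paper's exact method.

        # Illustrative sketch: kernel affinity matrix + spectral embedding.
        import numpy as np

        def rbf_affinity(X: np.ndarray, gamma: float = 0.1) -> np.ndarray:
            """Affinity matrix from a simple RBF kernel over feature vectors."""
            sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
            return np.exp(-gamma * sq_dists)

        def spectral_embedding(A: np.ndarray, dim: int = 3) -> np.ndarray:
            """Embed samples via leading eigenvectors of the normalized affinity."""
            d = A.sum(axis=1)
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            A_sym = D_inv_sqrt @ A @ D_inv_sqrt       # symmetric normalization
            eigvals, eigvecs = np.linalg.eigh(A_sym)  # eigenvalues ascending
            return eigvecs[:, -dim:]                  # top-`dim` eigenvectors

        # Usage: 100 action feature vectors of dimension 20 -> 3-D representation.
        X = np.random.default_rng(1).normal(size=(100, 20))
        Z = spectral_embedding(rbf_affinity(X), dim=3)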

    Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations

    An important line of research attempts to explain CNN image classifier predictions and intermediate layer representations in terms of human-understandable concepts. In this work, we expand on previous works in the literature that use annotated concept datasets to extract interpretable feature space directions, and propose an unsupervised post-hoc method to extract a disentangling interpretable basis by looking for the rotation of the feature space that explains sparse, one-hot, thresholded transformed representations of pixel activations. We experiment with existing popular CNNs and demonstrate the effectiveness of our method in extracting an interpretable basis across network architectures and training datasets. We extend the existing basis interpretability metrics found in the literature and show that intermediate layer representations become more interpretable when transformed to the bases extracted with our method. Finally, using the basis interpretability metrics, we compare the bases extracted with our method with the bases derived with a supervised approach and find that, in one aspect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one, and we give potential directions for future research.
    Comment: 15 pages, accepted in IEEE Transactions on Artificial Intelligence, Special Issue on New Developments in Explainable and Interpretable AI
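
    To make the objective concrete, here is an illustrative sketch that scores a candidate rotation of a feature space by how often thresholded rotated activations are one-hot. The scoring function and the crude random search are stand-ins invented for this example; the paper's actual optimization over rotations is not reproduced here.

        # Hedged sketch: score candidate rotations by one-hot sparsity of the
        # thresholded rotated activations. Illustrative assumptions throughout.
        import numpy as np

        def one_hot_score(Z: np.ndarray, R: np.ndarray, tau: float = 1.0) -> float:
            """Fraction of samples whose thresholded rotated features fire in
            exactly one dimension (i.e., look like one-hot concept indicators)."""
            H = (Z @ R) > tau                    # binary indicators after rotation
            return float(np.mean(H.sum(axis=1) == 1))

        def random_rotation(dim: int, rng) -> np.ndarray:
            """Sample a random orthogonal matrix via QR decomposition."""
            Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
            return Q

        # Crude random search over rotations (a gradient method on the
        # orthogonal manifold would be used in practice).
        rng = np.random.default_rng(2)
        Z = rng.normal(size=(1000, 16))          # stand-in pixel activations
        best_R, best_score = np.eye(16), one_hot_score(Z, np.eye(16))
        for _ in range(200):
            R = random_rotation(16, rng)
            score = one_hot_score(Z, R)
            if score > best_score:
                best_R, best_score = R, score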

    Being the center of attention: A Person-Context CNN framework for Personality Recognition

    This paper proposes a novel study on personality recognition using video data from different scenarios. Our goal is to jointly model nonverbal behavioral cues with contextual information for a robust, multi-scenario personality recognition system. Therefore, we build a novel multi-stream Convolutional Neural Network (CNN) framework that considers multiple sources of information. From a given scenario, we extract spatio-temporal motion descriptors from every individual in the scene, spatio-temporal motion descriptors encoding social group dynamics, and proxemics descriptors that encode the interaction with the surrounding context. All the proposed descriptors are mapped to the same feature space, facilitating the overall learning effort. Experiments on two public datasets demonstrate the effectiveness of jointly modeling mutual Person-Context information, outperforming state-of-the-art results for personality recognition in two different scenarios. Lastly, we present CNN class activation maps for each personality trait, shedding light on behavioral patterns linked with personality attributes.
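
    The following is a minimal, hypothetical sketch of the multi-stream idea: three descriptor streams projected into a shared feature space and fused for trait prediction. Layer sizes, fusion by concatenation, and the five-trait head are assumptions for illustration, not the paper's architecture.

        # Hedged sketch: multi-stream fusion in a shared feature space.
        import torch
        import torch.nn as nn

        class PersonContextNet(nn.Module):
            def __init__(self, dims=(256, 256, 64), shared_dim=128, n_traits=5):
                super().__init__()
                # One projection head per descriptor stream, all mapping into
                # the same shared_dim space to ease joint learning.
                self.streams = nn.ModuleList(
                    nn.Sequential(nn.Linear(d, shared_dim), nn.ReLU())
                    for d in dims
                )
                self.head = nn.Linear(shared_dim * len(dims), n_traits)

            def forward(self, person, group, proxemics):
                feats = [s(x) for s, x in
                         zip(self.streams, (person, group, proxemics))]
                return self.head(torch.cat(feats, dim=1))  # score per trait

        # Usage with random stand-in descriptors for a batch of 8 clips.
        net = PersonContextNet()
        scores = net(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 64))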

    Multimodal Fusion Based on Information Gain for Emotion Recognition in the Wild


    Towards multimodal player adaptivity in a serious game for fair resource distribution

    We present an initial demonstrator towards the creation of an adaptive serious game for teaching conflict resolution. The overall aim is the development of a game which detects and models player in-game behaviours and cognitive processes and, based on these, automatically generates content that drives the player towards personalized conflict resolution scenarios.

    Towards player’s affective and behavioral visual cues as drives to game adaptation

    Recent advances in emotion and affect recognition can play a crucial role in game technology. The move from typical game controls to controls generated from free gestures is already on the market. Higher-level controls, however, can also be driven by the player's own affective and cognitive behavior during gameplay. In this paper, we explore the player's behavior, as captured by computer vision techniques, together with the player's details regarding their own experience and profile. The objective of the current research is game adaptation aiming at maximizing player enjoyment. To this end, the ability to infer player engagement and frustration, along with the degree of challenge imposed by the game, is explored. The estimated levels of the induced metrics can feed an engine's artificial intelligence, allowing for game adaptation. This research was supported by the FP7 ICT project SIREN (project no: 258453).
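
    As a toy illustration of the closed loop described above (vision-based affect estimates feeding game adaptation), the rule below adjusts a difficulty level from estimated engagement, frustration, and challenge. The thresholds and the rule itself are invented for the example, not taken from the paper.

        # Toy sketch: affect estimates drive a difficulty-adaptation rule.
        def adapt_difficulty(difficulty: float, engagement: float,
                             frustration: float, challenge: float) -> float:
            """All inputs in [0, 1]; returns the new difficulty level."""
            if frustration > 0.7 and challenge > 0.7:
                difficulty -= 0.1   # player is overwhelmed: ease off
            elif engagement < 0.3 and challenge < 0.3:
                difficulty += 0.1   # player is bored: raise the stakes
            return min(max(difficulty, 0.0), 1.0)

        # Usage: one adaptation step from estimated affect levels.
        new_level = adapt_difficulty(0.5, engagement=0.2,
                                     frustration=0.1, challenge=0.25)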

    Does your profile say it all? Using demographics to predict expressive head movement during gameplay

    In this work, we explore the relation between expressive head movement and user profile information in gameplay settings. Facial gesture analysis cues are statistically correlated with players' demographic characteristics in two different settings: during gameplay and at events of special interest (when the player loses during gameplay). Experiments were conducted on the Siren database, which consists of 58 participants playing a modified version of Super Mario. Here, gender and age are considered as the player demographics, while the statistical importance of certain facial cues (other than typical/universal facial expressions) was analyzed. The proposed analysis aims at exploring the option of utilizing demographic characteristics as part of a user profiling scheme and interpreting visual behavior in a manner that takes those features into account.
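
    For illustration only, the snippet below shows the kind of statistical tests such an analysis might use: a Mann-Whitney U test of a facial cue across gender groups and a Spearman correlation against age, on stand-in data. The variable names and test choices are assumptions, not the study's actual analysis.

        # Hedged sketch: relating a facial cue statistic to demographics.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        # Stand-in data: per-player head-movement magnitude at "lose" events.
        cue_male = rng.normal(loc=1.0, scale=0.3, size=30)
        cue_female = rng.normal(loc=1.2, scale=0.3, size=28)

        # Mann-Whitney U test: does the cue distribution differ by gender?
        u_stat, p_gender = stats.mannwhitneyu(cue_male, cue_female)

        # Spearman correlation: does the cue vary monotonically with age?
        ages = rng.integers(18, 45, size=30)
        rho, p_age = stats.spearmanr(ages, cue_male)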

    The platformer experience dataset

    Player modeling and estimation of player experience have become very active research fields within affective computing, human-computer interaction, and game artificial intelligence in recent years. To advance our knowledge and understanding of player experience, this paper introduces the Platformer Experience Dataset (PED) - the first open-access game experience corpus - which contains multiple modalities of user data from Super Mario Bros players. The open-access database is intended for player experience capture through context-based (i.e. game content), behavioral, and visual recordings of platform game players. In addition, the database contains demographic data on the players and self-reported annotations of experience in two forms: ratings and ranks. PED opens up the way for desktop and console games that use video from web cameras and visual sensors, and offers possibilities for holistic player experience modeling approaches that can, in turn, yield richer game personalization.
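
    As a small, hypothetical illustration of the two annotation forms mentioned (ratings versus ranks), the snippet below derives pairwise preference ranks from per-session ratings. The conversion shown is only meant to clarify the distinction; it is not the dataset's own annotation protocol.

        # Hedged sketch: ratings are absolute per-session scores; ranks are
        # relative preferences between sessions.
        from itertools import combinations

        ratings = {"session_a": 4, "session_b": 2, "session_c": 5}  # 1-5 scale

        # Pairwise ranks: for every session pair, which one was preferred?
        pairwise_ranks = {
            (s1, s2): s1 if ratings[s1] > ratings[s2] else s2
            for s1, s2 in combinations(ratings, 2)
            if ratings[s1] != ratings[s2]   # ties yield no preference
        }
        print(pairwise_ranks)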